Clicking noise detection in a digital audio signal

ABSTRACT

In a method (M) to detect a noise signal (PS1, PS2, PS3) in a digital audio signal (EAS), it is provided that the audio signal (EAS) is divided into successive signal sections (SAS), and the energy contents of successive signal sections (SAS) are determined, and the energy contents of a signal section (SAS) are evaluated in relation to an energy threshold (ET), and that the occurrence of at least one high-energy signal section having an energy content above the energy threshold (ET), and the occurrence of at least one signal section (SAS) preceding the at least one high-energy signal section and having an energy content below the energy threshold (ET), and the occurrence of at least one signal section (SAS) following the at least one high-energy signal section and having an energy content below the energy threshold (ET) are detected, and that a quantity of signal sections (SAS) that precede the at least one high-energy signal section and a quantity of high-energy signal sections and a quantity of signal sections (SAS) that follow the high-energy signal section are counted.

The invention relates to a method for detecting a noise signal in a digital audio signal.

The invention further relates to a device for processing a digital audio signal, which device is equipped with noise-signal detection means designed to detect a noise signal in the audio signal.

The invention further relates to a computer program product, which is suitable for detecting a noise signal in a digital audio signal.

The invention further relates to a computer, which implements the computer program product in accordance with the previous paragraph.

A method of this kind, of the generic type mentioned above in the first paragraph, and a device of this kind, of the generic type mentioned above in the second paragraph, and a computer program product of this kind, of the generic type mentioned above in the third paragraph, and a computer of this kind, of the generic type mentioned above in the fourth paragraph have been put into circulation by the applicant in connection with a voice recognition system, and are therefore known.

In the known voice recognition system, spoken language in the form of an audio signal is fed to the voice recognition system via a microphone, and digitized. The digital audio signal thereby obtained comprises speech signals to represent the voice, and background signals, which represent background noise, and further noise signals, wherein the noise signals may, in some circumstances, be similar to the speech signals and could also occur in close proximity to them. This digital audio signal is subjected to a spectral analysis and to further processing, as a result of which a representation of the digital audio signal in a so-called cepstral representation is obtained. The noise signals may be incorrectly detected as words that have not been spoken at all. Noise signals in a digital audio signal further have the disadvantageous effect that, on playback of an audio signal of this kind, a listener may become irritated. For this reason, the voice recognition system is equipped with noise-signal detection means, which are designed to detect noise signals for the purpose of further treatment of these detected noise signals.

In the known voice recognition system, the problem exists that noise signals, especially clicking noise signals, that occur during a relatively short time span can be detected only within relatively long quiet pauses in which the audio signal does not represent a speech signal, but only a background signal, as a result of which noise signals that occur in the immediate proximity or in the vicinity of speech signals cannot be detected at all.

It is an object of the invention to eliminate the problem outlined above in a method of the generic type mentioned above in the first paragraph, and in a device of the generic type mentioned above in the second paragraph, and a computer program product of the generic type mentioned above in the third paragraph, and a computer of the generic type mentioned above in the fourth paragraph, and to create an improved method and an improved device and an improved computer program product and an improved computer.

To achieve the object mentioned above, in a method in accordance with the invention, features in accordance with the invention are provided so that a method in accordance with the invention can be characterized in the manner specified below, namely:

A method to detect a noise signal in a digital audio signal, wherein the digital audio signal is divided into successive signal sections, and wherein the energy contents of successive signal sections are determined, and wherein the energy contents of a signal section are evaluated in relation to an energy threshold, and wherein the occurrence of at least one high-energy signal section having an energy content above the energy threshold, and the occurrence of at least one signal section preceding the at least one high-energy signal section and having an energy content below the energy threshold, and the occurrence of at least one signal section following the at least one high-energy signal section and having an energy content below the energy threshold are detected, and wherein a quantity of signal sections that precede the at least one high-energy signal section and a quantity of high-energy signal sections and a quantity of signal sections that follow the high-energy signal section are counted.

To achieve the object mentioned above, in a device in accordance with the invention, features in accordance with the invention are provided so that a device in accordance with the invention can be characterized in the manner specified below, namely:

A device to process a digital audio signal, which is equipped with noise-signal detection means, which are designed to detect a noise signal in the audio signal, wherein audio-signal subdivision means, which are designed to subdivide the audio signal into successive signal sections, are provided, and wherein energy-contents detection means, which are designed to determine the energy contents of successive signal sections, are provided, and wherein energy-contents evaluation means, which are designed to evaluate the energy contents of a signal section in relation to an energy threshold, are provided, and wherein occurrence detection means, which are designed to detect the occurrence of at least one high-energy signal section having an energy content above the energy threshold, and to detect the occurrence of at least one signal section preceding the at least one high-energy signal section and having an energy content below the energy threshold, and to detect the occurrence of at least one signal section following the at least one high-energy signal section and having an energy content below the energy threshold, are provided, and wherein counting means, which are designed to count a quantity of signal sections that precede the at least one high-energy signal section and to count a quantity of high-energy signal sections and to count a quantity of signal sections that follow the at least one high-energy signal section, are provided.

To achieve the object mentioned above, in a computer program product in accordance with the invention, the computer program product can be loaded directly into a memory of a computer, and comprises software code sections, wherein the method in accordance with the invention can be implemented with the computer when the computer program product is implemented on the computer.

To achieve the object mentioned above, in a computer in accordance with the invention, the computer is equipped with a processor unit and an internal memory, which implements the computer program product in accordance with the paragraph quoted above.

By virtue of the provision of the measures in accordance with the invention, the advantage is obtained that clicking noise signals can be detected in the most reliable possible manner, and that this detection takes place on the basis of a representation of the audio signal in terms of time, using an energy pattern established in this representation in terms of time and taking account of an existence of this energy pattern in terms of time, which has the result that complex transformation methods for transforming the representation of the audio signal in terms of time into a representation other than a representation in terms of time may be completely dispensed with, and therefore the invention can be realized with the availability of only a relatively low calculation power, and a fast and simple detection of these noise signals is assisted.

In a solution in accordance with the invention, it has further proved advantageous if the features as claimed in claim 2 are provided. This gives rise to the advantage that an evaluation of the energy contents of signal sections in a standardized representation using decibels as the unit is relatively simple to perform.

In a solution in accordance with the invention, it has further proved advantageous if the features as claimed in claim 3 are provided. This gives rise to the advantage that the energy threshold is determined quasi-continuously and therefore always in the correct ratio to the actual signal level of the audio signal in each case, as a result of which an incorrect detection, or no detection at all, of clicking noise signals owing to an incorrect definition or approximation of the actual energy threshold is virtually ruled out.

In a solution in accordance with the invention, it may, for example, be provided that a duration of the signal sections is selected to be between 1 millisecond and 40 milliseconds. It has, however, proved especially advantageous if the features as claimed in claim 4 are provided, since, as a result, diverse properties of the audio signal can be reacted to in a preferred value range in the most dynamic manner possible, i.e. by signal sections of varying lengths.

In a solution in accordance with the invention, it has proved especially advantageous if the features as claimed in claim 5 are provided. This gives rise to the advantage that this uniform time resolution of the audio signal assists a precise detection of a clicking noise signal.

In a solution in accordance with the invention, it has proved especially advantageous if the features as claimed in claim 6 are provided. This gives rise to the advantage that an unambiguous detection of a clicking noise signal in the audio signal, avoiding an incorrect detection of useful signals similar to it, is assured, since, in a useful signal exhibiting speech, a pause comprising n signal sections virtually does not occur within a word, and since an energy pattern comprising m and l signal sections rules out an end of a spoken word, known as a half-syllable. In a solution in accordance with the invention, provision may also be made to establish whether l lies in the range between 1 and 9, and to establish whether m is equal to or greater than a value from the range between 6 and 11, and to establish whether n is equal to or greater than a value from the range between 27 and 38.

In a solution in accordance with the invention, it has further proved advantageous if the features as claimed in claim 7 are provided. This gives rise to the advantage that even clicking noise signals that occur repeatedly are distinguishable in a simple, reliable manner from useful signals in the audio signal, and therefore reliably detectable.

In a solution in accordance with the invention, it has further proved advantageous if the features as claimed in claim 9 are provided. This gives rise to the advantage that noise signals can be removed from the audio signal virtually in realtime, and therefore an audio signal free from noise signals can be made available.

The above-mentioned aspects and further aspects of the invention are explained below.

The invention will be further described with reference to examples of embodiments shown in the drawings, to which, however, the invention is not restricted.

FIG. 1 shows, in a schematic manner, in the form of a block circuit diagram, a device in accordance with a first embodiment example of the invention.

FIG. 2 shows, in a manner analogous to FIG. 1, an invention-relevant detail of the device in accordance with FIG. 1.

FIG. 3 shows, in the form of two diagrams, an audio signal exhibiting a clicking noise signal, which audio signal can be processed with the aid of the device in accordance with the invention, and a sequence of energy contents of the audio signal corresponding with signal sections of the audio signal.

FIG. 4 shows, by analogy with FIG. 3, an audio signal exhibiting multiple clicking noise signals, and a sequence of energy contents.

FIG. 5 shows, in the form of a diagram, a histogram of the energy contents of the audio signal in accordance with FIG. 3.

FIG. 6 shows, in the form of a block circuit diagram, a data processing system equipped with a computer in accordance with the invention, with which clicking noise signals are detectable in a digital audio signal with the aid of a computer program product in accordance with the invention.

FIG. 7 shows, in the form of a flowchart, a method in accordance with the invention for detecting clicking noise signals in a digital audio signal.

FIG. 1 shows a device 1 to process a digital audio signal DASI, which device 1 is realized by a mobile dictation machine.

The digital audio signal DASI can be generated from an acoustic input audio signal ASI, which is shown in the upper diagram of FIG. 3 during a first time range, wherein the amplitude A of audio signal ASI is shown as a function of time t. The audio signal ASI is formed by a speech signal SP and a background signal BG occurring during a pause within the speech signal SP, and a single noise signal PS1 occurring during a relatively short time span during the pause. A start of the pause is marked by a time mark PB. An end of the pause is marked by a time mark PE. A start of noise signal PS1 is marked by a time mark SB1. An end of noise signal PS1 is marked by a time mark SE1. The upper diagram in FIG. 4 shows the input audio signal ASI in a manner analogous to the upper diagram in FIG. 3, during a second time range. By contrast with the first time range, following on from noise signal PS1, two further noise signals PS2 and PS3, which are similar in structure to noise signal PS1, occur in the second time range. Noise signal PS2 is delimited by time marks SB2 and SE2. Noise signal PS3 is delimited by time marks SB3 and SE3. Noise signals PS1 or PS2 and PS3 respectively have been generated on operation of the mobile dictation machine. It should be mentioned, however, that noise signals PS1, PS2 and PS3 of this kind can also be generated by events in the vicinity of the dictation machine. For reasons of scale, audio signal ASI is not shown in FIG. 3 and FIG. 4 for multiple time ranges I.

Device 1 is equipped with reception means 2, which is designed to receive the input audio signal ASI. Reception means 2 is equipped with a microphone, which is not shown in FIG. 1, and with a reception amplifier with an automatic gain control, which is not shown in FIG. 1, and with an analog/digital converter, which is not shown in FIG. 1. The reception means 2 is designed to generate and deliver a digital audio signal DASI representing the input audio signal ASI, which digital audio signal DASI is present in a pulse-code modulation coding, PCM for short, in sixteen-bit format.

Device 1 is further equipped with compression means 3, which is designed to receive the digital audio signal DASI and to generate and deliver a compressed audio signal CAS, which compressed audio signal CAS has a data volume that is reduced by comparison with digital audio signal DASI. In the present case, compression means 3 is designed to generate a compressed audio signal CAS that has been compressed in accordance with the “Code Excited Linear Prediction (CELP)” compression standard. It should, however, be mentioned at this point that any other compression method may also be used, or that the digital audio signal DASI may be further processed compression-free.

Compression means 3 is further designed for write access to first storage means 4, which first storage means 4 is provided to store the compressed audio signal CAS, so the compressed audio signal CAS can be stored in first storage means 4.

Device 1 is further equipped with decompression means 5, which is designed for read access to first storage means 4 and, during access to first storage means 4, to read compressed audio signal CAS stored in first storage means 4. Decompression means 5 is further designed to decompress the compressed audio signal CAS and to generate and deliver a decompressed audio signal EAS.

Device 1 is further equipped with noise-signal detection means 6, which is designed to receive the digital, decompressed audio signal EAS and to detect, in realtime, the noise signal PS1 or the noise-signal train PS1, PS2 and PS3 in audio signal EAS.

For this purpose, noise-signal detection means 6 is equipped with audio-signal subdivision means 7, which is designed to subdivide audio signal EAS into successive signal sections SAS, which signal sections SAS represent audio signal ASI for a time span P in each case. A multiplicity of time spans P is drawn along time axis t in FIG. 3 and in FIG. 4. In the present case, time span P is selected to be five milliseconds. It should, however, be mentioned that other values can also be selected for P, such as between two and ten milliseconds, which, however, as is clear to a person skilled in the art, could have an effect on the quality of detection of the noise signals and/or an effect on other parameters influencing the detection of noise signals PS1 and PS2 or PS3 respectively, to which parameters we refer below in greater detail.
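The subdivision into signal sections of time span P can be illustrated with a minimal Python sketch. The function name `split_into_sections`, the `sample_rate_hz` parameter and the decision to discard a trailing partial section are assumptions made for illustration; the description does not specify a sample rate or how an incomplete final section is treated.

```python
def split_into_sections(samples, sample_rate_hz, span_ms=5.0):
    """Split samples into successive, non-overlapping sections of span_ms each.

    Assumption: a trailing remainder shorter than one section is discarded.
    """
    n = int(round(sample_rate_hz * span_ms / 1000.0))  # samples per section
    if n <= 0:
        raise ValueError("time span P is too short for this sample rate")
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


# Hypothetical usage: 100 samples at 8 kHz give two complete 5 ms sections.
sections = split_into_sections(list(range(100)), sample_rate_hz=8000)
```

At an assumed sample rate of 8 kHz, a time span P of five milliseconds corresponds to 40 samples per section.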

The noise-signal detection means 6 is further equipped with energy-contents determination means 8, which is designed to determine the energy contents of successive signal sections SAS, wherein the energy contents of a signal section SAS are determined in accordance with the formula $E = 10\log_{10}\left(\frac{1}{N}\sum_{k=1}^{N} S_{k}^{2}\right)$ in the unit decibels (dB), wherein S_(k) represents the k-th signal amplitude within signal section SAS, and wherein N represents the total quantity of signal amplitudes S_(k) within signal section SAS. The energy-contents determination means 8 is further designed to generate and deliver energy-contents data EVD representing the determined energy contents. The determined energy contents of signal sections SAS are shown in the lower diagrams in FIG. 3 and FIG. 4 respectively, in the form of a bar chart in each case.
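The energy formula can be sketched in a few lines of Python. The name `section_energy_db` and the `floor_db` value returned for an all-zero section (where the logarithm is undefined) are assumptions for illustration and are not taken from the description.

```python
import math


def section_energy_db(section, floor_db=0.0):
    """Energy content E = 10*log10((1/N) * sum(S_k^2)) of one signal section, in dB."""
    if not section:
        raise ValueError("a signal section must contain at least one amplitude")
    mean_square = sum(s * s for s in section) / len(section)
    if mean_square <= 0.0:
        # Assumption: an all-zero section is clamped to floor_db.
        return floor_db
    return 10.0 * math.log10(mean_square)


# Hypothetical usage with sixteen-bit PCM amplitudes.
full_scale = section_energy_db([32767] * 8)
```

For a section held at the sixteen-bit full-scale amplitude of 32767, the result is about 90.3 dB, which agrees with the energy-contents top limit of 90 dB mentioned for the PCM sixteen-bit format.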

The noise-signal detection means 6 is further equipped with energy-threshold determination means 9, which is designed to determine an energy threshold ET, wherein the energy threshold ET is determined continuously on the basis of energy-contents data EVD. In detail, the determination of energy threshold ET is based on a basic method and on a refinement method, both of which methods are described below in greater detail.

With the basic method, within a normal time slot of six seconds, through which, metaphorically speaking, the energy-contents data EVD is pushed, a histogram H of the energy contents of all signal sections SAS occurring within the normal time slot is firstly created, as shown in FIG. 5. It should, however, be mentioned at this point that values other than the six seconds may also be provided for the normal time slot. In histogram H, the energy contents are plotted along the abscissa, wherein, in accordance with the selected PCM sixteen-bit format, an energy-contents top limit UB lies at 90 dB. A step-like characteristic of the edge curve thus obtained for the area of histogram H is approximated by a continuously proceeding envelope curve EV. Below the envelope curve, a low-energy area LEA and a high-energy area HEA are defined in such a way that each of the areas exhibits ten percent of the area below the envelope curve, wherein it should again be mentioned at this point that, instead of ten percent, other values, such as values between five and fifteen percent, may also be provided. The positions of the respective delimitation lines of the two areas LEA and HEA give rise to noise-signal level NL and useful-signal level SL. The ratio between useful-signal level SL and noise-signal level NL, referred to below as SNR, relating to the specialist expression “Signal-to-Noise Ratio”, is calculated as the difference between the useful-signal level SL and the noise-signal level NL, since both levels are expressed in decibels. Further used is a parameter designated “Noise Offset”, which takes account of an anticipated minimum energy bandwidth of the noise, and which, in the present case, exhibits a value of four decibels. A further parameter designated “Energy Factor” represents an anticipated noise component of the overall SNR range, and, in the present case, has a value of 0.34.
Using the above-mentioned parameters, the energy threshold ET can be calculated as follows, as a function of the condition applicable in each case:
IF (SNR > Noise Offset) ET = NL + Energy Factor * SNR
ELSE ET = NL + Noise Offset
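The basic method can be sketched in Python as follows. As a simplifying assumption, the sketch replaces the envelope curve EV fitted to histogram H with empirical percentiles of the sorted energy contents, so that NL and SL are read off at the ten-percent and ninety-percent positions; the function names are hypothetical.

```python
def percentile_level(sorted_energies, fraction):
    """Energy content at the given fraction of the sorted values (nearest lower rank)."""
    idx = int(fraction * (len(sorted_energies) - 1))
    return sorted_energies[idx]


def basic_energy_threshold(energies_db, area_fraction=0.10,
                           noise_offset_db=4.0, energy_factor=0.34):
    """Energy threshold ET from the energy contents within the normal time slot."""
    if not energies_db:
        raise ValueError("at least one energy content is required")
    ordered = sorted(energies_db)
    nl = percentile_level(ordered, area_fraction)        # noise-signal level NL
    sl = percentile_level(ordered, 1.0 - area_fraction)  # useful-signal level SL
    snr = sl - nl                                        # difference of dB levels
    if snr > noise_offset_db:
        return nl + energy_factor * snr
    return nl + noise_offset_db


# Hypothetical usage: half background at 20 dB, half speech at 60 dB.
et = basic_energy_threshold([20.0] * 50 + [60.0] * 50)
```

With NL = 20 dB and SL = 60 dB, the SNR of 40 dB exceeds the Noise Offset, so ET = 20 + 0.34 * 40 = 33.6 dB. If all energy contents were equal, the SNR would be zero and the ELSE branch would place ET four decibels above NL.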

It should be mentioned at this point that, at the start of a determination of the energy threshold ET, i.e. at an instant at which a sufficient number of energy contents have not yet been determined within the normal time slot in order to determine the energy threshold ET, a minimum time slot of one second is used instead of the normal time slot in order to avoid a resultant significant delay in the determination of energy threshold ET using the normal time slot.

In a refinement method, on the assumption that, with the basic method, the duration of the normal time slot is too long to react to rapid changes in noise-signal level NL, the noise-signal level NL is determined from the energy contents most recently determined within a short time slot of one hundred milliseconds, wherein a mean value of the energy contents is calculated in accordance with the formula $NL = \frac{1}{M}\sum_{i=1}^{M} EVD_{i}$, wherein the energy-contents data EVD_(i) represents the energy contents within the short time slot, and wherein M is the quantity of energy-contents data EVD_(i) within the short time slot. In accordance with the refined determination of the noise-signal level NL, determination of the energy threshold ET takes place as in the basic method, wherein the formulae specified in the basic method are used, and wherein SNR is determined in accordance with the basic method. The energy-threshold determination means 9 is further designed to deliver the energy threshold ET determined in accordance with the basic method or in accordance with the refinement method in the form of energy-threshold data ETD. The energy threshold ET determined by the particular method is entered in the lower diagrams in FIG. 3 and FIG. 4 respectively, wherein changes in the energy threshold ET occurring over time are not explicitly shown.
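A minimal Python sketch of the refinement method follows. The function names are hypothetical, and the sketch assumes that the energy contents are held in a list ordered by time, with sections of five milliseconds so that the short time slot of one hundred milliseconds covers M = 20 values.

```python
def refined_noise_level(energies_db, section_ms=5.0, short_slot_ms=100.0):
    """Mean of the most recent energy contents within the short time slot (NL)."""
    m = max(1, int(short_slot_ms / section_ms))  # M = 20 for the stated values
    recent = energies_db[-m:]
    if not recent:
        raise ValueError("at least one energy content is required")
    return sum(recent) / len(recent)


def refined_energy_threshold(nl_refined, snr_basic,
                             noise_offset_db=4.0, energy_factor=0.34):
    """ET from the refined NL, with SNR taken over from the basic method."""
    if snr_basic > noise_offset_db:
        return nl_refined + energy_factor * snr_basic
    return nl_refined + noise_offset_db


# Hypothetical usage: speech followed by 100 ms of background around 20 dB.
history = [50.0] * 100 + [22.0] * 10 + [18.0] * 10
nl = refined_noise_level(history)
et = refined_energy_threshold(nl, snr_basic=40.0)
```

Only the last twenty values enter the mean, so the earlier speech sections at 50 dB do not influence NL, which is the point of the short time slot.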

In determining the energy threshold ET, the refinement method is always used, and the basic method is used, starting from time mark SE1, in the direction of arrow T2 and, if applicable, also starting from time mark SE2, in the direction of arrow T3, and, starting from time mark SE3, in the direction of arrow T4, for, in each case, a maximum of thirty signal sections SAS, which represent a background signal BG, in order to stabilize the refinement method.

The noise-signal detection means 6 is further equipped with countingmeans 11. Counting means 11 is designed to count a quantity of adjacent,successive high-energy signal sections exhibiting an energy contentabove energy threshold ET, and to generate and deliver high-energynumerator-count data NCH representing this quantity. For the audiosignals AIS shown in FIG. 3 and FIG. 4, this situation exists betweentime marks SB1 and SE1, SB2 and SE2 and SB3 and SE3 respectively.

The counting means 11 is further designed to count a quantity of signal sections SAS, which precede the at least one high-energy signal section, and which exhibit an energy content below the energy threshold ET. For the audio signals ASI shown in FIG. 3 and FIG. 4, this situation exists starting to the left of time mark SB1 and extending in the direction of an arrow T1. Counting means 11 is further designed to count a quantity of signal sections, which follow the at least one high-energy signal section, and which exhibit an energy content below the energy threshold. For the audio signals ASI shown in FIG. 3 and FIG. 4, this situation exists starting at time mark SE1 and extending in the direction of an arrow T2. For the audio signal ASI shown in FIG. 4, this situation further exists starting at time mark SE2 and extending in the direction of an arrow T3, and starting at time mark SE3 and extending in the direction of an arrow T4. In both cases, i.e. in the case of signal sections SAS preceding a high-energy signal section or in the case of signal sections SAS following a high-energy signal section, the relevant quantity of signal sections SAS is represented physically by low-energy numerator-count data NCL, which, however, can be divided logically, i.e. as regards their occurrence in terms of time, into numerator-count data NCL_(PRE) and NCL_(POST).

Accordingly, counting means 11 is realized in an advantageous manner by only two numerators, which are not shown in FIG. 2, wherein a first numerator is provided to generate the low-energy numerator-count data NCL, and wherein a second numerator is provided to generate the high-energy numerator-count data NCH, and wherein the counting means 11 is designed to receive and to process a numerator signal NE, which serves to communicate to counting means 11 which of the two numerators is to be incremented. The numerator-count data NCH or NCL present at the respective numerators is permanently available. Counting means 11 is further designed to receive a numerator-reset signal NR, which effects an erasure of the numerator status represented by the numerator-count data NCH and NCL.

The noise-signal detection means 6 is further equipped with energy-contents evaluation means 12, which is designed to receive the energy-contents data EVD and the energy-threshold data ETD determined in each case, and which is designed to evaluate in each case the energy contents of a signal section SAS referred to the applicable energy threshold ET. The energy-contents evaluation means 12 is provided to interact with occurrence-detection means 13. The occurrence-detection means 13 is designed to generate and to deliver the numerator signal NE and the numerator-reset signal NR. The occurrence-detection means 13 is further designed to detect, with the aid of an evaluation result from energy-contents evaluation means 12, the occurrence of at least one high-energy signal section, such as between time mark SB1 and SE1, and to detect the occurrence of at least one signal section SAS preceding the at least one high-energy signal section and exhibiting an energy content below the energy threshold ET, such as to the left of time mark SB1, and to detect the occurrence of a signal section SAS following the at least one high-energy signal section and exhibiting an energy content below the energy threshold ET, such as to the right of time mark SE1.

The occurrence-detection means 13 is further designed to generate and to deliver occurrence-detection data RD in the event that the occurrence of the energy pattern described in the preceding paragraph has been detected in signal sections SAS and the quantity of groups of signal sections SAS forming the energy pattern in each case corresponds to a hypothesis, so a clicking noise signal is present. Here, it is established during checking of the hypothesis whether the energy contents of m successive signal sections SAS, which are represented by low-energy numerator-count data NCL_(PRE) and which precede the high-energy signal sections, fall below the energy threshold ET, wherein m is equal to or greater than nine. It is further established whether the energy contents of l successive high-energy signal sections, which are represented by high-energy numerator-count data NCH, exceed the energy threshold ET, wherein l lies between three and seven. It is further established whether the energy contents of n successive signal sections, which are represented by low-energy numerator-count data NCL_(POST) and which follow the high-energy signal sections, fall below the energy threshold ET, wherein n is equal to or greater than thirty. This hypothesis, which can be applied to the audio signals ASI shown in FIG. 3 and FIG. 4, can be formulated mathematically in accordance with the condition specified below: (NCL_(PRE) >= 9) AND (3 <= NCH <= 7) AND (NCL_(POST) >= 30).
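The hypothesis for a single clicking noise signal can be written as a short Python predicate. The function and parameter names are hypothetical; the default limits are the values of m, l and n given above.

```python
def is_single_click(ncl_pre, nch, ncl_post, m_min=9, l_min=3, l_max=7, n_min=30):
    """Check (NCL_PRE >= 9) AND (3 <= NCH <= 7) AND (NCL_POST >= 30)."""
    return ncl_pre >= m_min and l_min <= nch <= l_max and ncl_post >= n_min


# Hypothetical usage: 12 quiet sections, 5 high-energy sections, 40 quiet sections.
click_present = is_single_click(ncl_pre=12, nch=5, ncl_post=40)
```

With signal sections of five milliseconds, these limits correspond to at least 45 ms of low energy before the click, a click lasting 15 ms to 35 ms, and at least 150 ms of low energy after it.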

For the situation shown in FIG. 4, noise-signal detection means 6 is designed to detect repeatedly occurring noise signals PS1, PS2 and PS3 respectively. Here, the hypothesis is expanded to the effect that it is established whether, following on from high-energy signal sections, for example, following signal sections SAS of noise signal PS1 or noise signal PS2, during thirty signal sections following these high-energy signal sections, which exhibit an energy content below the energy threshold, the energy contents of further high-energy signal sections, as is the case during, for example, noise signals PS2 and PS3, exceed the energy threshold ET. This hypothesis, thus expanded, which can be applied to the audio signals ASI shown in FIG. 3 and FIG. 4, can be formulated mathematically in accordance with the condition specified below: $(NCL_{PRE} \geq 9)\ \mathrm{AND}\ \left(\sum_{i=1}^{3} NCH_{i} \geq 3\right)\ \mathrm{AND}\ (NCH_{i} \leq 7,\ \forall i,\ 1 \leq i \leq 3)\ \mathrm{AND}\ (NCL_{POST,i} < 30,\ \forall i,\ 1 \leq i \leq 2)\ \mathrm{AND}\ (NCL_{POST,3} \geq 30)$.

For clarification, it should be mentioned at this point that NCH₁ represents the quantity of high-energy signal sections during noise signal PS1, and that NCH₂ represents the quantity of high-energy signal sections during noise signal PS2, and that NCH₃ represents the quantity of high-energy signal sections during noise signal PS3. It should further be clarified that NCL_(PRE) represents the quantity of signal sections SAS preceding the NCH₁ high-energy signal sections. It should further be clarified that NCL_(POST, 1) represents the quantity of signal sections SAS occurring between noise signals PS1 and PS2, and that NCL_(POST, 2) represents the quantity of signal sections SAS occurring between noise signals PS2 and PS3, and that NCL_(POST, 3) represents the quantity of signal sections occurring after noise signal PS3, which exhibit an energy content below energy threshold ET. It should also be clarified that i represents the quantity of noise signals PS1, PS2 and PS3 within the pause, and that the quantity of further noise signals PS2 and PS3 etc. occurring after the first noise signal PS1 is limited to twenty-five. It should, however, be mentioned that i may also be assigned a different maximum value, depending on the particular application case.
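The expanded hypothesis can be sketched as a Python predicate over lists of counts. The condition in the description is written for three noise signals; generalizing it to any number of noise signals up to one first signal plus twenty-five further signals is an assumption of this sketch, as are the function and parameter names.

```python
def is_click_train(ncl_pre, nch_list, ncl_post_list,
                   m_min=9, l_min=3, l_max=7, n_min=30, max_further=25):
    """Expanded hypothesis for repeated clicking noise signals.

    nch_list[i] is NCH_(i+1); ncl_post_list[i] is NCL_(POST, i+1).
    """
    k = len(nch_list)
    if k == 0 or len(ncl_post_list) != k:
        return False
    if k - 1 > max_further:  # limit on noise signals after the first one
        return False
    return (ncl_pre >= m_min
            and sum(nch_list) >= l_min
            and all(c <= l_max for c in nch_list)
            and all(gap < n_min for gap in ncl_post_list[:-1])
            and ncl_post_list[-1] >= n_min)


# Hypothetical usage for a train PS1, PS2, PS3 as in FIG. 4.
train_present = is_click_train(12, [4, 3, 5], [10, 8, 40])
```

Only the final low-energy run has to reach thirty sections; every earlier run must stay below thirty, because a run of thirty or more would already terminate the train.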

To this end, energy-contents evaluation means 12 and occurrence-detection means 13 are realized by a so-called “State Machine”, which is designed to check continuously, on the basis of the energy-contents data EVD and the energy-threshold data ETD and the two numerator-count data NCL and NCH, the above-cited conditions and, depending on the results of this check, to remain in one of its states or to change its state. The states hereby essentially represent the fact that the above-described hypothesis obtains or that this hypothesis does not obtain. In the event that the hypothesis obtains, a further distinction is made between three further states, wherein one state represents a before-pause, formed by background signal BG, before the first noise signal PS1, and wherein a further state represents the high-energy signal sections during one of noise signals PS1, PS2, PS3, and wherein a further state represents intermediate pauses, formed by background signal BG, between time marks SE1 and SB2 or SE2 and SB3, or an after-pause between time marks SE3 and PE. The state machine is designed to generate and deliver numerator signal NE in the event that it remains in a state. The state machine is further designed to generate and deliver the occurrence-detection data RD as the result of a change in state, if the above-mentioned conditions for detection of a clicking noise signal PS1 or a sequence of clicking noise signals PS1, PS2, PS3 are wholly fulfilled, and, in the event that it is not a clicking noise signal detectable in accordance with the hypothesis, to generate and deliver the numerator-reset signal NR.
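A state machine of this kind might be sketched in Python as follows. This is an illustrative simplification and not the flowchart of FIG. 7: the state names, the function `detect_clicks`, and the input `above_flags` (one boolean per signal section, true when its energy content exceeds ET) are assumptions. A detection corresponds to delivery of occurrence-detection data RD, and a return to the idle state corresponds to the numerator-reset signal NR.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # hypothesis does not obtain
    BEFORE_PAUSE = auto()  # low-energy sections before a first noise signal
    HIGH = auto()          # high-energy sections during a noise signal
    AFTER_PAUSE = auto()   # intermediate pause or after-pause


def detect_clicks(above_flags, m_min=9, l_min=3, l_max=7, n_min=30, max_further=25):
    """Return the section indices at which a click or click train is confirmed."""
    state = State.IDLE
    ncl = nch = accumulated = repeats = 0
    detections = []
    for idx, above in enumerate(above_flags):
        if state is State.IDLE:
            if not above:
                state, ncl = State.BEFORE_PAUSE, 1
        elif state is State.BEFORE_PAUSE:
            if not above:
                ncl += 1
            elif ncl >= m_min:  # before-pause long enough: a click may start
                state, nch, accumulated, repeats = State.HIGH, 1, 0, 0
            else:               # pause too short: reset
                state, ncl = State.IDLE, 0
        elif state is State.HIGH:
            if above:
                nch += 1
                if nch > l_max:  # too long for a click: reset
                    state, ncl, nch = State.IDLE, 0, 0
            else:
                accumulated += nch
                state, ncl = State.AFTER_PAUSE, 1
        else:  # State.AFTER_PAUSE
            if not above:
                ncl += 1
                if ncl >= n_min:  # after-pause complete
                    if accumulated >= l_min:
                        detections.append(idx)
                    state = State.BEFORE_PAUSE  # this pause also precedes any next click
            elif repeats < max_further:  # further noise signal inside the pause
                repeats += 1
                state, nch = State.HIGH, 1
            else:
                state, ncl, nch = State.IDLE, 0, 0
    return detections


# Hypothetical usage: speech, a 12-section pause, a 4-section click, a long pause, speech.
flags = [True] * 20 + [False] * 12 + [True] * 4 + [False] * 35 + [True] * 20
clicks = detect_clicks(flags)
```

In this example the click is confirmed at section index 65, the thirtieth low-energy section after the click, so the sketch reports a click only once the after-pause condition is met.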

In accordance with the above information, a method for the detection of noise signal PS1 in accordance with FIG. 3, or of a sequence of noise signals PS1, PS2, PS3 in accordance with FIG. 4, in digital audio signal EAS can be implemented with the aid of device 1, wherein digital audio signal EAS is divided into successive signal sections SAS, and wherein the energy contents of successive signal sections SAS are determined, and wherein the energy contents of a signal section SAS are evaluated in relation to an energy threshold ET, and wherein the occurrence of at least one high-energy signal section exhibiting an energy content above the energy threshold ET, the occurrence of at least one signal section SAS preceding the at least one high-energy signal section and exhibiting an energy content below energy threshold ET, and the occurrence of at least one signal section SAS following the at least one high-energy signal section and having an energy content below the energy threshold ET are detected, and wherein a quantity of signal sections SAS that precede the at least one high-energy signal section, a quantity of high-energy signal sections and a quantity of signal sections SAS that follow the high-energy signal section are counted.

Below, the method M for the detection of noise signal PS1 or noise signals PS1, PS2 and PS3, which can be implemented with the aid of device 1, is explained in detail with reference to the flowchart shown in FIG. 7.

To this end, the variables specified below, which are necessary for detection of noise signals PS1 or PS1, PS2 and PS3 and the values of which are amended during implementation of method M, are first introduced. A first variable E represents the energy contents of the particular signal section SAS. A second variable CL represents the quantity of high-energy signal sections, wherein this quantity corresponds to the high-energy numerator-count data NCH. A third variable SL represents the quantity of signal sections SAS, the energy value of which lies below the energy threshold ET, wherein this quantity corresponds to the low-energy numerator-count data NCL. A fourth variable CLACCU represents an accumulated quantity of high-energy signal sections in the event that, during a pause, individual high-energy signal sections or groups of these high-energy signal sections repeatedly occur. A fifth variable RC represents a repetition numerator value for counting the quantity of repeatedly occurring noise signals PS2 or PS3. A sixth variable SLMAYBERESET represents a logic value for reaching a decision. At the start of method M, the numerical variables CL, SL, CLACCU and RC are assigned a value of zero. The logic variable SLMAYBERESET is assigned the logic value False.

The parameters specified below, which are used in the method for sequence control, are also introduced. A first parameter SBEGIN represents the minimum quantity of signal sections SAS that represent the background signal BG before the first high-energy signal section occurs, wherein, in the present case, the value nine is provided. A second parameter SEND represents the minimum quantity of signal sections SAS that represent the background signal BG and occur after the last high-energy signal section belonging to a noise signal PS1 or a sequence of noise signals PS1, PS2 or PS3, wherein, in the present case, the value thirty is provided. A third parameter CMIN represents the minimum quantity of high-energy signal sections required for detection of a noise signal PS1, PS2 or PS3, wherein, in the present case, the value three is provided. A fourth parameter CMAX represents the maximum quantity of high-energy signal sections permitted for detection of a noise signal PS1, PS2 or PS3, wherein, in the present case, the value seven is provided. A fifth parameter MAXREP represents the maximum permitted quantity of repeatedly occurring high-energy signal sections, wherein, in the present case, the value twenty-five is provided.

The implementation of method M for every signal section SAS starts at a block M1 as soon as the digital audio signal EAS has been divided into successive signal sections SAS, the energy contents have been determined for the particular signal section SAS and are represented by the variable E, and the energy threshold ET applicable in the particular case is available.
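The preparation that precedes block M1 can be sketched as follows, using the energy formula stated in claim 2, E = 10 log10((1/N) Σ S_k²), and the five-millisecond signal-section duration of claim 5. This is an illustrative sketch under assumptions: the function name section_energies, the sample rate of 8000 Hz, the dropping of a trailing incomplete section and the floor value that avoids taking the logarithm of zero are not taken from the description.

```python
import math


def section_energies(samples, sample_rate=8000, section_ms=5):
    """Divide samples into successive sections and return E per section.

    E = 10 * log10((1/N) * sum(S_k ** 2)), with N amplitudes per section.
    """
    n = max(1, sample_rate * section_ms // 1000)  # N amplitudes per section
    energies = []
    # Assumption: a trailing section shorter than N amplitudes is dropped.
    for start in range(0, len(samples) - n + 1, n):
        section = samples[start:start + n]
        mean_square = sum(s * s for s in section) / n
        # Assumption: floor the mean square so that digital silence
        # does not lead to log10(0).
        energies.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return energies
```

Each returned value plays the role of the variable E for one signal section SAS and could then be compared with the energy threshold ET at block M2.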

At a block M2, the energy contents of the particular signal section SAS are evaluated in relation to energy threshold ET. In the event that it is established at block M2 that E lies below energy threshold ET, this means that a signal section SAS representing background signal BG in the pauses between time marks PB and PE is present, and progression is to a block M3.

At block M3, a check is made as to whether CL is greater than zero.

In the event that CL is not greater than zero, this means that a pause has been initiated. Progression in this case is to a block M5, at which SL is increased by a value of one. Method M is then terminated at a block M6.

In the event that CL is greater than zero, this means that a pause after a noise signal detectable in accordance with the hypothesis is involved, for which noise signal at least one of the conditions of the hypothesis is fulfilled. Progression in this case is to a block M7, at which a check is made as to whether SLMAYBERESET is equal to False.

In the event that SLMAYBERESET is equal to True, this means that the first signal section SAS after a noise signal PS1, PS2 or PS3, detectable in accordance with the hypothesis, between time marks SE1 and SB2, SE2 and SB3 or SE3 and PE may be involved. Progression is to a block M9, at which SL is assigned the value zero in order to enable the recounting of signal sections SAS in the next pause. Subsequently, the method is continued at a block M10, at which SLMAYBERESET is assigned False. Subsequently, method M is continued at block M8.

In the event that SLMAYBERESET is equal to False, meaning that a signal section SAS other than the first one of the pause between time marks SE1 and SB2, SE2 and SB3 or SE3 and PE is involved, progression is to a block M8, at which SL is increased by the value of one.

After block M8, method M is continued at a block M11, at which a check is made as to whether SL is equal to SEND. In the event that this discontinuation condition is not fulfilled, progression is to block M6. In the event that SL is equal to SEND, progression is to a block M12, at which CLACCU is increased by the value of CL. After block M12, the method is continued at a block M13.

At block M13, a check is made as to whether CLACCU is less than CMIN.

In the event that CLACCU is not less than CMIN, this means that a noise signal PS1 or a noise-signal sequence PS1, PS2 and PS3 has been detected, and progression is to a block M14. At block M14, the occurrence-detection data RD is generated and delivered. Subsequently, method M is continued at a block M15, at which CL, CLACCU and RC are assigned the value of zero and at which SLMAYBERESET is assigned the value False. The method then ends at block M6.

In the event that CLACCU is less than CMIN, method M is continued at block M15.

If it is the case at block M2 that E is not less than ET, this means that a signal section SAS that represents either a speech signal SP or a noise signal PS1, PS2 or PS3 is present. In this case, progression is to a block M4.

At block M4, a check is made as to whether CL has a value of zero and whether SL is less than SBEGIN.

In the event that the check condition is fulfilled at block M4, this means that the pause during which background signal BG was present was not long enough, and that the signal section SAS is not a noise signal PS1, PS2 or PS3 detectable in accordance with the hypothesis. In this case, progression is to a block M16, at which SL is assigned the value zero. Method M is then continued at block M15. The continuation of method M in accordance with blocks M15 and M16 corresponds to the generation of the numerator-reset signal NR.

In the event that the check condition is not fulfilled at block M4, this means that a noise signal PS1, PS2 or PS3 detectable in accordance with the hypothesis may be involved. As a result, progression is to a block M17.

At block M17, a check is made as to whether CL is greater than zero and whether SLMAYBERESET is equal to False.

In the event that the check condition is fulfilled at block M17, this means that signal section SAS may be the start of one of the noise signals PS2 or PS3, and progression is to a block M18.

At block M18, a check is made as to whether RC is less than MAXREP.

In the event that RC is not less than MAXREP, this means that a valid noise signal PS2 or PS3, i.e. one that can be detected in accordance with the hypothesis, is not involved, and progression is to block M16.

In the event that RC is less than MAXREP, this means that one of the noise signals PS2 or PS3 following after the first noise signal PS1 may be involved, and progression is to a block M19. At block M19, RC is increased by the value of one, and method M is continued at a block M20. At block M20, CLACCU is increased by the value CL, and method M is continued at a block M21. At block M21, CL is assigned the value of one, and the method is continued at a block M22. At block M22, SLMAYBERESET is assigned logic value True, and the method is terminated at block M6.

In the event that the check condition is not fulfilled at block M17, this means that signal section SAS may be the start of the first noise signal PS1, or may be a signal section SAS within one of the noise signals PS1, PS2 or PS3 that is not the first signal section SAS of that noise signal. In this case, progression is to a block M23. At block M23, CL is increased by the value of one, and method M is continued at a block M24. At block M24, SLMAYBERESET is assigned the logic value True, and method M is continued at a block M25.

At block M25, a check is made as to whether CL is greater than CMAX. In the event that CL is greater than CMAX, this means that the duration of the high-energy signal sections was too long, and therefore no noise signal PS1 or PS2 or PS3 can be present, and progression is to block M16. In the event that CL is not greater than CMAX, progression is to block M6, and method M is terminated at block M6.

In conclusion, it should be mentioned in connection with method M that, if the condition is fulfilled at block M3, this means that the duration of the before-pause before a noise signal was long enough, that the quantity of high-energy signal sections was not greater than CMAX, and that the quantity of repeatedly occurring noise signals lies within the permitted range.
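The sequence of blocks M2 to M25 described above can be summarized in the following sketch, which follows the flowchart description block by block and uses the parameter values given for the present case. It is an illustration under assumptions rather than the implementation of device 1: the names ClickStateMachine, step and detect are hypothetical, the return value True stands in for the delivery of the occurrence-detection data RD, and block M16 is assumed to be followed by block M15.

```python
from dataclasses import dataclass


@dataclass
class ClickStateMachine:
    # Parameter values provided "in the present case".
    SBEGIN: int = 9
    SEND: int = 30
    CMIN: int = 3
    CMAX: int = 7
    MAXREP: int = 25
    # Variables of method M with their start values.
    CL: int = 0
    SL: int = 0
    CLACCU: int = 0
    RC: int = 0
    SLMAYBERESET: bool = False

    def _reset(self, clear_sl):
        """Block M16 (only if clear_sl) followed by block M15."""
        if clear_sl:
            self.SL = 0                               # M16
        self.CL = self.CLACCU = self.RC = 0           # M15
        self.SLMAYBERESET = False

    def step(self, E, ET):
        """Process one signal section; True stands for delivery of RD (M14)."""
        if E < ET:                                    # M2: background section
            if self.CL == 0:                          # M3
                self.SL += 1                          # M5
                return False
            if self.SLMAYBERESET:                     # M7
                self.SL = 0                           # M9
                self.SLMAYBERESET = False             # M10
            self.SL += 1                              # M8
            if self.SL != self.SEND:                  # M11
                return False
            self.CLACCU += self.CL                    # M12
            detected = self.CLACCU >= self.CMIN       # M13
            self._reset(clear_sl=False)               # M15
            return detected
        if self.CL == 0 and self.SL < self.SBEGIN:    # M4: before-pause too short
            self._reset(clear_sl=True)
            return False
        if self.CL > 0 and not self.SLMAYBERESET:     # M17: start of PS2, PS3, ...
            if self.RC >= self.MAXREP:                # M18
                self._reset(clear_sl=True)
                return False
            self.RC += 1                              # M19
            self.CLACCU += self.CL                    # M20
            self.CL = 1                               # M21
            self.SLMAYBERESET = True                  # M22
            return False
        self.CL += 1                                  # M23
        self.SLMAYBERESET = True                      # M24
        if self.CL > self.CMAX:                       # M25: burst too long
            self._reset(clear_sl=True)
        return False


def detect(energies, ET):
    """Return the indices of the sections at which RD would be delivered."""
    machine = ClickStateMachine()
    return [i for i, E in enumerate(energies) if machine.step(E, ET)]
```

With these values, nine background sections followed by three high-energy sections and thirty background sections lead to a single detection at the last section (index 41), whereas the same pattern with only two high-energy sections, or with eight, leads to no detection.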

The device 1 shown in FIG. 1 is further equipped with supply means 14, which is designed to supply and deliver a noise-signal-free audio signal DASO, taking account of an individual detected noise signal PS1 or a sequence of multiple detected noise signals PS1, PS2 and PS3. To this end, supply means 14 is equipped with second storage means 15, which is designed for the temporary storage of a multiplicity of signal sections SAS that can be generated with the aid of the audio-signal subdivision means 7. The supply means 14 is further equipped with resetting means 16, which is designed to receive the occurrence-detection data RD. The resetting means 16 is further designed for read access to the second storage means 15 for the purpose of reading the temporarily stored signal sections SAS. Resetting means 16 is further designed to reset the signal sections SAS containing noise signals PS1 or PS1, PS2 and/or PS3 that can be identified with the aid of the occurrence-detection data RD, and for lining up, without gaps, the remaining signal sections SAS, as a result of which a digital noise-signal-free audio signal DASO is formed. It should be mentioned in this context that the supply means 14 may also be designed to replace the signal sections SAS containing noise signals PS1 and/or PS2 and PS3. It may, for example, be provided that these signal sections SAS are replaced with signal sections SAS representing a zero signal, the signal level of which represents silence. It may further be provided, for example, that these signal sections SAS are replaced with signal sections having an artificially generated background signal.
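The two options described for supply means 14, namely lining up the remaining signal sections or replacing the affected signal sections with a zero signal, can be sketched as follows. This is a simplified illustration; the function name supply_noise_free and the representation of the affected signal sections as a set of section indices are assumptions, since the description does not specify how the occurrence-detection data RD identifies these sections.

```python
def supply_noise_free(sections, noisy_indices, mode="remove"):
    """Return the signal sections with the noisy ones removed or silenced.

    sections      -- list of signal sections, each a list of amplitudes
    noisy_indices -- set of indices of sections containing a noise signal
    mode          -- "remove" lines up the remaining sections without gaps;
                     "zero" replaces each noisy section with a zero signal
    """
    if mode not in ("remove", "zero"):
        raise ValueError("mode must be 'remove' or 'zero'")
    result = []
    for index, section in enumerate(sections):
        if index not in noisy_indices:
            result.append(list(section))
        elif mode == "zero":
            # Same duration as the original section, signal level of silence.
            result.append([0] * len(section))
        # mode == "remove": the noisy section is simply skipped.
    return result
```

Replacement with an artificially generated background signal would follow the same pattern, with the zero signal exchanged for generated amplitudes.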

Device 1 is further equipped with delivery means 17, which is designed to receive the noise-signal-free audio signal DASO and, using the noise-signal-free signal DASO, to generate and deliver an acoustic audio signal ASO.

Device 1 is further equipped with interface means 18, which is designed to receive the decompressed audio signal EAS and to deliver the decompressed audio signal EAS in the form of an electrical signal to an appliance, not shown in FIG. 1, that can be connected to the interface means 18.

Device 1 is further equipped with control means 19, which is designed to control the reception means 2, the compression means 3, the decompression means 5, the noise-signal detection means 6, the supply means 14, the delivery means 17 and the interface means 18. To this end, control means 19 is connected to the means 2, 3, 5, 6, 14, 17 and 18. Control means 19 is further designed to generate a control signal CS and to deliver this control signal CS to the means 2, 3, 5, 6, 14, 17 and 18. In the present case, control means 19 is designed to receive control information that can be fed to it manually by means of a user operation. It should, however, be mentioned that control means 19 may also be designed to receive control information that can be fed in by means of an infrared signal or a radio-frequency signal.

This gives rise to the advantage that, in the case of device 1, a detection and elimination of clicking noise signals PS1 or PS2 and PS3 that are disturbing to a user of device 1 can be undertaken in a reliable manner.

FIG. 6 shows a data processing system 18, equipped with a computer 19 and a monitor 20 connected to computer 19, which serves as the visual user interface, and a mouse 21 connected to computer 19, and a keyboard 22 connected to computer 19, wherein the mouse 21 and the keyboard 22 serve as a manual user interface. Computer 19 can be operated with the aid of the user interfaces by a user not shown in FIG. 6. Computer 19 is further equipped with an internal memory 23, which is provided for the storage of processing data and/or of program data. The computer is further equipped with a processor unit 24, which is designed to interact with memory 23 and with the aid of which the processing data can be processed using the program data, which program data can be implemented with the aid of the processor unit. For the purpose of controlling computer 19, processor unit 24 is further designed to interact with the user interfaces 20, 21 and 22. Computer 19 is further equipped with a program data/processing data interface 25, with the aid of which a computer-readable medium 26 can be accessed, which, in the present case, is realized by a compact disk, or CD for short.

Further connected to computer 19 is a mobile dictation machine 28, which is similar to the device 1 shown in FIG. 1, wherein the mobile dictation machine 28 is not equipped with the supply means 14 and the noise-signal detection means 6 of device 1, and wherein the digital, decompressed audio signal EAS can be fed directly to delivery means 17. It is further provided that, with the aid of an audio-signal interface of computer 19 not shown in FIG. 6, the digital audio signal EAS can be fed to computer 19 and can be stored in memory 23, so that further processing with the aid of processor unit 24 is enabled.

To this end, a computer program product 27 can be fed to computer 19 via its program data/processing data interface 25 with the aid of the medium 26. Computer program product 27 can be loaded directly into the memory 23 of computer 19, and comprises software code sections, which software code sections form at least parts of the program data, wherein the method M can be implemented with computer 19 in order to detect the noise signal PS1 in accordance with FIG. 3 or the noise signals PS1, PS2 and PS3 in accordance with FIG. 4 in the digital audio signal EAS if computer program product 27 is implemented on computer 19 with the aid of processor unit 24.

This gives rise to the advantage that, both in the case where the digital audio signal EAS is further processed on computer 19 with the aid of voice recognition software and in the case where the audio signal EAS is to be reproduced with the aid of computer 19, a reliable detection of a noise signal PS1 or, if applicable, PS2 or PS3 in the digital time representation of audio signal ASI is ensured.

It should be further mentioned that, in the case of device 1, the noise-signal detection means 6 and, if applicable, the supply means 14 may be provided between the reception means 2 and the compression means 3.

It should be further mentioned that, in the case of device 1, the means 6 and 14 may be provided between the means 5 and 18, so that data representing a noise-signal-free, decompressed audio signal EAS can be delivered from device 1.

It should be further mentioned that medium 26 may be formed by a DVD or by an exchangeable hard disk or by a diskette.

It should be further mentioned that, in the case of device 1, at least components of the means 2, 17, 19 and 18 and the means 3, 4, 5, 6 and 14 are preferably realized as an integrated circuit.

It should be further mentioned that, in the case of the noise-signal detection means 6, a processing of signal sections SAS that are directly adjacent to one another, or of signal sections SAS that are not directly adjacent to one another, may take place.

It should be further mentioned that the noise-signal detection means 6 may be equipped with third storage means 10, shown with broken lines in FIG. 2, for the temporary storage of data EVD and ETD, and that the energy-contents evaluation means 12 and the occurrence detection means 13 may be designed for accessing the stored data EVD and ETD and for processing this data EVD and ETD, as a result of which a non-realtime detection of noise signals is enabled.

It should be further mentioned that the noise-signal detection means 6 may also be designed for the dynamic division of audio signal EAS into signal sections SAS of differing signal-section durations in a range between two milliseconds and ten milliseconds as a function of properties of audio signal EAS.

1. A method (M) to detect a noise signal (PS1, PS2, PS3) in a digital audio signal (EAS), wherein: the digital audio signal (EAS) is divided into successive signal sections (SAS); the energy contents of successive signal sections (SAS) are determined; the energy contents of a signal section (SAS) are evaluated in relation to an energy threshold (ET); the occurrence of at least one high-energy signal section having an energy content above the energy threshold (ET), and the occurrence of at least one signal section (SAS) preceding the at least one high-energy signal section and having an energy content below the energy threshold (ET), and the occurrence of at least one signal section (SAS) following the at least one high-energy signal section and having an energy content below the energy threshold (ET) are detected; and a quantity of signal sections (SAS) that precede the at least one high-energy signal section (SAS) and a quantity of high-energy signal sections and a quantity of signal sections (SAS) that follow the high-energy signal section are counted.
 2. A method (M) as claimed in claim 1, wherein: the energy contents of a signal section (SAS) are determined in accordance with the formula $E = 10 \log_{10}\left(\frac{1}{N}\sum_{k=1}^{N} S_k^2\right)$; $S_k$ represents the signal amplitudes within the signal section (SAS), and wherein N represents the total quantity of signal amplitudes within the signal section (SAS).
 3. A method (M) as claimed in claim 1, wherein the energy threshold (ET) is determined continuously from the digital audio signal (EAS) on the basis of a histogram method applied to the energy contents of the signal sections (SAS), taking account of a quickly changing background level and with the aid of a ratio between a useful-signal level and a noise level of the audio signal (EAS).
 4. A method (M) as claimed in claim 1, wherein the signal sections (SAS) exhibit a signal-section duration (P) of between two milliseconds and ten milliseconds.
 5. A method (M) as claimed in claim 1, wherein each of the signal sections (SAS) exhibits a signal-section duration (P) of five milliseconds.
 6. A method (M) as claimed in claim 1, wherein: it is established whether the energy contents of l successive high-energy signal sections exceed the energy threshold (ET), wherein l lies between 3 and 7; it is established whether the energy contents of m successive signal sections (SAS) preceding the high-energy signal sections fall below the energy threshold (ET), wherein m is equal to or greater than 9; and it is established whether the energy contents of n successive signal sections (SAS) following the high-energy signal sections fall below the energy threshold (ET), wherein n is equal to or greater than 30.
 7. A method (M) as claimed in claim 1, wherein: it is established whether, subsequent to high-energy signal sections, during signal sections (SAS) following these high-energy signal sections, which exhibit an energy content below the energy threshold (ET), further high-energy signal sections follow; and the quantity of high-energy signal sections and the quantity of signal sections (SAS) which follow the further high-energy signal sections are counted.
 8. A device (1) to process a digital audio signal (EAS), which is equipped with noise-signal detection means (6), which are designed to detect a noise signal (PS1, PS2, PS3) in the audio signal (EAS), wherein: audio-signal subdivision means (7), which are designed to subdivide the audio signal (EAS) into successive signal sections (SAS), are provided; energy-contents determination means (8), which are designed to determine the energy contents of successive signal sections (SAS), are provided; energy-contents evaluation means (12), which are designed to evaluate the energy contents of a signal section (SAS) in relation to an energy threshold (ET), are provided; and occurrence detection means (13), which are designed to detect the occurrence of at least one high-energy signal section having an energy content above the energy threshold (ET), and to detect the occurrence of at least one signal section (SAS) preceding the at least one high-energy signal section and having an energy content below the energy threshold (ET), and to recognize the occurrence of at least one signal section (SAS) following the at least one high-energy signal section and having an energy content below the energy threshold (ET) are provided, and wherein counting means (11), which are designed to count a quantity of signal sections (SAS) that precede the at least one high-energy signal section and to count a quantity of high-energy signal sections and to count a quantity of signal sections (SAS) that follow the at least one high-energy signal section, are provided.
 9. A device (1) as claimed in claim 8, wherein supply means (14), which are designed to supply a noise-signal-free audio signal (DASO), taking account of the detected noise signal (PS1, PS2, PS3), are provided.
 10. A computer program product (27), which can be loaded directly into a memory (23) of a computer (19), and comprises software code sections, wherein the method (M) in accordance with claim 1 can be implemented with the computer (19) when the computer program product (27) is implemented on the computer (19).
 11. A computer program product (27) as claimed in claim 10, wherein the computer program product (27) is stored on a computer-readable medium (26).
 12. A computer (19) with a processor unit (24) and an internal memory (23), which implements the computer program product (27) as claimed in claim 10.